A Strategy for Selecting Classes of Symbols from Classes of Graphemes in HMM-Based Handwritten Word Recognition
Abstract
This paper presents a new strategy for selecting classes of symbols from classes of graphemes in HMM-based recognition of handwritten words from Brazilian legal amounts. It discusses features, graphemes and symbols, because our baseline system follows a global approach that avoids explicit segmentation of words into letters or pseudo-letters and relies on HMM models. In this framework, the input data are the symbols of an alphabet based on graphemes extracted from the word images; these symbols are what is visible to the Hidden Markov Model. The idea is to introduce high-level concepts, such as perceptual features (loops, ascenders, descenders, concavities and convexities), and to provide fast and informative feedback about the information contained in each class of grapheme for symbol class selection. The paper presents an algorithm in which Mutual Information and HMM work in the same evaluation process. Finally, the experimental results demonstrate that it is possible to select, from the “original” grapheme set (composed of 94 graphemes), an alphabet of 29 symbols. We conclude that the discriminating power of the grapheme is very important for consolidating an alphabet of symbols.
Keywords: Features, Mutual Information, HMM, Handwritten Word Recognition.
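The abstract does not give the details of the selection algorithm, so the following is only a minimal sketch of the mutual-information half of the idea: scoring how much the presence of a grapheme class tells us about the word class. The function names, the binary presence encoding, and the toy data (word labels "um" and "dois", grapheme names "loop" and "stroke") are illustrative assumptions, not the authors' method, and the HMM evaluation step is not modelled here.

```python
import math
from collections import Counter


def mutual_information(presence, labels):
    """Estimate I(G; C) in bits from paired samples.

    presence: 0/1 flags, one per word image, saying whether a (hypothetical)
              grapheme class occurs in that image.
    labels:   the word class of each image.
    """
    n = len(labels)
    joint = Counter(zip(presence, labels))  # counts of (g, c) pairs
    count_g = Counter(presence)             # marginal counts of g
    count_c = Counter(labels)               # marginal counts of c
    mi = 0.0
    for (g, c), count in joint.items():
        p_gc = count / n
        # p(g, c) / (p(g) * p(c)) written with raw counts
        ratio = count * n / (count_g[g] * count_c[c])
        mi += p_gc * math.log2(ratio)
    return mi


def rank_graphemes(presence_by_grapheme, labels):
    """Return grapheme names sorted from most to least informative."""
    scores = {
        name: mutual_information(flags, labels)
        for name, flags in presence_by_grapheme.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed for illustration only): four word images, two classes.
labels = ["um", "um", "dois", "dois"]
presence_by_grapheme = {
    "loop": [1, 1, 0, 0],    # perfectly aligned with the word class
    "stroke": [1, 0, 1, 0],  # independent of the word class
}
ranking = rank_graphemes(presence_by_grapheme, labels)
```

In this toy setting the "loop" grapheme carries one full bit about the word class and "stroke" carries none, so a selection step would keep the former and could merge or drop the latter. A real system would presumably validate such a ranking against HMM recognition results, as the abstract suggests.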
Similar resources
Holistic Farsi handwritten word recognition using gradient features
In this paper we address the issue of recognizing Farsi handwritten words. Two types of gradient features are extracted from a sliding vertical stripe which sweeps across a word image. These are directional and intensity gradient features. The feature vector extracted from each stripe is then coded using the Self Organizing Map (SOM). In this method each word is modeled using the discrete Hidde...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate data entry into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Online Recognition of Handwritten Korean and English Characters
In this study, an improved HMM-based recognition model is proposed for online English and Korean handwritten characters. The pattern elements of the handwriting model are sub-character strokes and ligatures. To deal with the problem of handwriting style variations, a modified Hierarchical Clustering approach is introduced to partition different writing styles into several classes. For each of t...
Mixture of Experts for Persian handwritten word recognition
This paper presents the results of Persian handwritten word recognition based on the Mixture of Experts technique. In the basic form of ME, the problem space is automatically divided into several subspaces for the experts, and the outputs of the experts are combined by a gating network. In our proposed model, we used Mixture of Experts Multi Layered Perceptrons with Momentum term, in the classification ...
Publication date: 2004